Visual analysis of viseme dynamics

Author

  • Aseel Turkmani
Abstract

Face-to-face dialogue is the most natural mode of communication between humans. The combination of human visual perception of expression and perception of changes in intonation provides semantic information that communicates ideas, feelings and concepts. Realistic modelling of speech movements through automatic facial animation, while maintaining audio-visual coherence, remains a challenge in both the computer graphics and film industries. A common approach to producing visual speech is to interpolate, in sequence, parameters that describe mouth variation; these units are known as visemes, and a viseme corresponds to a phoneme in an utterance. Most talking-head systems use sets of static visemes, each represented by a single mouth-shape image or 3D model. However, discretising visemes in this way does not account for context-dependent dynamic information, i.e. coarticulation. This thesis presents several visual analysis and dynamic modelling techniques for visual phones, spanning capture and representation through to analysis and synthesis of speech movements and coarticulation. A novel method is reported for the automatic extraction of inner-lip contour edges from sequences of mouth images in speech. The proposed detection technique is a key-frame, exemplar-based method that does not depend on prior frame information for initialisation, allowing reliable and accurate inner-lip localisation despite the large frame-to-frame changes in lip shape inherent in 25 Hz video of visual speech. Visual analysis of phonemes in continuous speech is then performed, involving an investigation of mouth representations and a comparative analysis of static and dynamic representations of visemes. This analysis shows the need to analyse and model the underlying dynamics of visemes due to coarticulation. Finally, visual analysis of lip coarticulation in Vowel-Consonant-Vowel (VCV) utterances is presented, together with a novel approach, based on ensemble statistics, to the analysis and modelling of temporal dynamics. Results show that the temporal influence of coarticulation is significant both in lip-shape variation and in the timing of lip movements during coarticulation. The effect of temporal variation due to coarticulation is thus statistically significant and should be taken into account when modelling visual speech synthesis. The work in this thesis provides a foundation for further research towards perceptually realistic animation of a talking head and an understanding of the visual dynamics of shape and texture during speech.
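The interpolation scheme the abstract describes is easy to make concrete. The sketch below is a minimal Python illustration, with hypothetical viseme labels and parameter values (not the thesis's data): it linearly blends static mouth-shape vectors between viseme keyframes at 25 Hz. Each in-between frame depends only on its two neighbouring visemes, which is precisely the context-independence that static-viseme systems suffer from.

```python
import numpy as np

# Hypothetical static viseme table: each viseme is one fixed vector of
# mouth-shape parameters (e.g. lip width, lip height, jaw opening).
VISEMES = {
    "sil": np.array([0.0, 0.0, 0.0]),  # mouth at rest
    "p":   np.array([0.2, 0.0, 0.0]),  # bilabial closure
    "aa":  np.array([0.6, 0.9, 0.8]),  # open vowel
}

def interpolate_visemes(timeline, fps=25):
    """Linearly interpolate mouth parameters between static viseme keyframes.

    timeline: sorted list of (viseme_label, start_time_in_seconds) pairs.
    Returns one parameter vector per video frame.
    """
    frames = []
    for (label_a, t_a), (label_b, t_b) in zip(timeline, timeline[1:]):
        n = max(1, int(round((t_b - t_a) * fps)))
        for i in range(n):
            alpha = i / n  # fraction of the way towards the next viseme
            frames.append((1 - alpha) * VISEMES[label_a] + alpha * VISEMES[label_b])
    frames.append(VISEMES[timeline[-1][0]])  # hold the final target
    return np.stack(frames)

# "pa" at 25 Hz: every in-between frame blends exactly two static targets,
# so the /p/ closure looks identical whatever vowels surround it; this is
# the coarticulation effect the thesis argues such models miss.
params = interpolate_visemes([("sil", 0.0), ("p", 0.2), ("aa", 0.32), ("sil", 0.6)])
print(params.shape)  # (frames, parameters)
```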


Similar articles

Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis

The use of visemes as atomic speech units in visual speech analysis and synthesis systems is well established. Viseme labels are determined using a many-to-one phoneme-to-viseme mapping. However, due to visual coarticulation effects, an accurate mapping from phonemes to visemes should define a many-to-many mapping scheme. In this research it was found that neither the use of standardized no...
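As a rough illustration of the distinction drawn above (the labels and groupings here are invented for the example, not the paper's actual scheme), a many-to-one map assigns every phoneme a single viseme class, whereas a many-to-many map lets the phonetic context select among several candidate visemes:

```python
# Illustrative many-to-one map: each phoneme has exactly one viseme class.
MANY_TO_ONE = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "aa": "V_open", "uw": "V_round",
}

# Illustrative many-to-many map: a phoneme offers several candidate visemes,
# and the surrounding phonemes decide which one is realised.
MANY_TO_MANY = {
    "p": {"V_bilabial_round", "V_bilabial_spread"},
}

def realise(phoneme, prev_phoneme):
    """Pick a context-dependent viseme for `phoneme` (toy rounding rule)."""
    if phoneme not in MANY_TO_MANY:
        return MANY_TO_ONE[phoneme]
    # Toy coarticulation rule: lip rounding carries over from a rounded
    # neighbour, so the same phoneme realises different visemes by context.
    return ("V_bilabial_round" if prev_phoneme in ("uw", "ow")
            else "V_bilabial_spread")

print(realise("aa", "p"))   # V_open: context plays no role
print(realise("p", "uw"))   # V_bilabial_round: rounding carried over
print(realise("p", "iy"))   # V_bilabial_spread: spread context
```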


Viseme comparison based on phonetic cues for varying speech accents

Human interaction through speech is a multisensory activity, wherein the spoken audio is perceived using both auditory and visual cues. However, in the absence of an auditory stimulus, speech content can be perceived through lip reading, using the dynamics of the social context. In our earlier work [1], we presented a tool enabling the hearing impaired to understand spoken speech in videos, throug...


Phoneme-to-viseme Mapping for Visual Speech Recognition

Phonemes are the standard modelling unit in HMM-based continuous speech recognition systems. Visemes are the equivalent unit in the visual domain, but there is less agreement on precisely what visemes are, or how many to model on the visual side in audio-visual speech recognition systems. This paper compares the use of 5 viseme maps in a continuous speech recognition task. The focus of the stud...


Automatic Viseme Clustering for Audiovisual Speech Synthesis

A common approach in visual speech synthesis is the use of visemes as atomic units of speech. In this paper, phoneme-based and viseme-based audiovisual speech synthesis techniques are compared in order to explore the balance between data availability and improved audiovisual coherence for synthesis optimization. A technique for automatic viseme clustering is described and it is compared to ...
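One plausible way to realise such automatic clustering, sketched here in Python and not necessarily the paper's method, is to cluster the visual feature vectors of phoneme instances (e.g. with k-means) and then read a data-driven phoneme-to-viseme map off the cluster assignments:

```python
import numpy as np
from collections import Counter, defaultdict
from sklearn.cluster import KMeans

# Hypothetical data: one mouth-shape feature vector per phoneme instance.
# Random values stand in for features extracted from tracked lip contours.
rng = np.random.default_rng(0)
phoneme_labels = ["p", "b", "m", "f", "v", "aa"] * 50
features = rng.normal(size=(len(phoneme_labels), 8))

# Cluster the instances in visual-feature space; each cluster becomes a
# candidate viseme class.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)

# Derive the phoneme-to-viseme map by majority vote: each phoneme joins
# the cluster that most of its instances were assigned to.
votes = defaultdict(Counter)
for phoneme, cluster in zip(phoneme_labels, kmeans.labels_):
    votes[phoneme][cluster] += 1
phoneme_to_viseme = {p: c.most_common(1)[0][0] for p, c in votes.items()}
print(phoneme_to_viseme)
```

With real lip-shape features, visually similar phonemes such as /p/, /b/ and /m/ would be expected to land in the same cluster, recovering the classic bilabial viseme without a hand-made table.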


Modeling Continuous Visual Speech Using Boosted Viseme Models

In this paper, a novel connected-viseme approach for modeling continuous visual speech is presented. The approach adopts AdaBoost-HMMs as the viseme models. Continuous visual speech is modeled by connecting the viseme models using the level building algorithm. The approach is applied to identify words and phrases in visual speech. The recognition results indicate that the proposed method has better...
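The level building step can be pictured with a much-simplified dynamic-programming search (a toy sketch under assumed inputs, not the paper's AdaBoost-HMM implementation): given per-frame log-likelihoods from each viseme model, it searches over segment boundaries for the best way to explain the whole frame sequence as a chain of viseme segments.

```python
import numpy as np

def connect_visemes(loglik, min_len=2):
    """Toy level-building search over viseme segment boundaries.

    loglik[v, t]: log-likelihood that frame t was produced by viseme model v.
    Returns the best-scoring segmentation as (model, start, end) triples.
    """
    n_vis, n_frames = loglik.shape
    prefix = np.cumsum(loglik, axis=1)  # prefix sums, one row per model

    def seg_score(v, s, e):  # score of model v over frames [s, e)
        return prefix[v, e - 1] - (prefix[v, s - 1] if s > 0 else 0.0)

    best = np.full(n_frames + 1, -np.inf)  # best score ending at frame e
    best[0] = 0.0
    back = [None] * (n_frames + 1)         # backpointers: (start, model)
    for e in range(min_len, n_frames + 1):
        for s in range(0, e - min_len + 1):
            for v in range(n_vis):
                score = best[s] + seg_score(v, s, e)
                if score > best[e]:
                    best[e], back[e] = score, (s, v)

    # Trace the winning chain of viseme segments back from the last frame.
    path, e = [], n_frames
    while e > 0:
        s, v = back[e]
        path.append((v, s, e))
        e = s
    return list(reversed(path))

rng = np.random.default_rng(1)
loglik = np.log(rng.random((3, 20)))  # 3 toy viseme models, 20 frames
print(connect_visemes(loglik))
```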


Automatic Viseme Vocabulary Construction to Enhance Continuous Lip-reading

Speech is the most common communication method between humans and involves the perception of both auditory and visual channels. Automatic speech recognition focuses on interpreting the audio signal, but it has been demonstrated that video can provide information that is complementary to the audio. Thus, the study of automatic lip-reading is important and is still an open problem. One of the ke...



Journal title:

Volume   Issue

Pages  -

Publication date: 2008